Computing Idioms Frequency in Text Corpora
Abstract
Idioms are phrases whose meaning is not composed from the meanings of the individual words in the phrase. They are one of the natural examples of a violation of the principle of compositionality, which places idioms within the natural language processing problem of meaning extraction. Counting how often such idiomatic phrases occur in corpora has one main aim: to find out which phrases are used often and which less often. Knowing this lets us start extracting the meaning of whole phrases rather than only of individual words, which improves the understanding of natural language.
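The counting step described above can be illustrated with a minimal sketch. It is not the author's method. It assumes the following:

- a hand-made list of idioms is given in advance;
- text is tokenized with a simple lowercase regular expression;
- only exact, contiguous occurrences count, with no lemmatization and no discontinuous or modified forms.

The function names `tokenize` and `idiom_frequencies` and the sample sentences are hypothetical, chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase the text and treat runs of
    # letters/apostrophes as tokens (punctuation is discarded).
    return re.findall(r"[a-z']+", text.lower())


def idiom_frequencies(corpus: list[str], idioms: list[str]) -> Counter:
    """Count exact, contiguous occurrences of each idiom across all documents.

    Hypothetical helper: matching is done on token sequences, so inflected
    or interrupted variants of an idiom are NOT counted.
    """
    # Pre-tokenize each idiom once so we compare token tuples, not substrings.
    idiom_tokens = {idiom: tuple(tokenize(idiom)) for idiom in idioms}
    # Start every idiom at 0 so unseen idioms still appear in the result.
    counts = Counter({idiom: 0 for idiom in idioms})
    for doc in corpus:
        tokens = tokenize(doc)
        for idiom, pattern in idiom_tokens.items():
            n = len(pattern)
            if n == 0:
                continue
            # Slide a window of the idiom's length over the document tokens.
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == pattern:
                    counts[idiom] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy corpus and idiom list.
    corpus = [
        "It was raining cats and dogs, so we called it a day.",
        "Don't spill the beans. Let's call it a day!",
        "The cats and dogs were fine.",
    ]
    idioms = ["call it a day", "spill the beans",
              "raining cats and dogs", "kick the bucket"]
    # Print idioms from most to least frequent.
    for idiom, n in idiom_frequencies(corpus, idioms).most_common():
        print(f"{n}\t{idiom}")
```

In this toy run, "called it a day" in the first sentence is not counted as "call it a day", because matching is on exact surface forms. A realistic idiom counter would at least lemmatize tokens and allow for inserted or varied words. The sketch only shows the basic frequency-counting idea.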
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Towards automatic retrieval of idioms in
The goal of this paper is to present a procedure for the automatic retrieval of idiomatic expressions from large text corpora. The procedure combines text segmentation techniques and Latent Semantic Analysis (Landauer, Foltz, Laham, 1998). Three indices were computed on the basis of the three-fold hypothesis that a) idiomatic expressions should have few neighbours, that b) idiomatic expressions...
Enhanced Phraseological Idiomaticity in Chinese Translational Texts: A Corpus-Based Study of Chinese Four-Character Idioms in Translational and Non-Translational Literal Texts
The aim of a corpus-based approach to the study of Chinese idioms in translational and non-translational texts is to test the preliminary hypothesis regarding the remarkable use of typical four-character expressions, especially idioms and collocations, in Chinese translational texts, which has been conceived and developed largely from my Ph.D. dissertation on a corpus-based study of four-char...
Strategies Employed in Translation of Idioms in English Subtitles of Two Persian Television Series
Translation of idioms seems to be complicated for most translators since the meaning of idioms is difficult and sometimes impossible to be deduced from the meaning of their individual components. Considering the difficulties of translation of idioms and also the specific constraints of subtitling such as space and time limits, this research studied the strategies employed in translation of idio...
Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research
This paper describes a project aimed at converting a legacy representation of English idioms into an XML-based format. The project is set in the context of a large electronic English-Polish dictionary which contains several hundred formalized idiom descriptions and which has been released under the terms of a free license. In short, the project consists of three phases: cleaning up the dictiona...